Abstract. Program generation and transformation systems manipulate large, parameterized
object language fragments. Support for user-definable concrete syntax makes this easier
but is typically restricted to certain object and meta languages. We show how Prolog can
be retrofitted with concrete syntax and describe how a seamless interaction of concrete
syntax fragments with an existing “legacy” meta-programming system based on abstract
syntax is achieved. We apply the approach to gradually migrate the schemas of the
AutoBayes program synthesis system to concrete syntax. First experiences show that this
can result in a considerable reduction of the code size and an improved readability of
the code. In particular, abstracting out fresh-variable generation and second-order term
construction allows the formulation of larger continuous fragments and improves the
“locality” in the schemas.


1 Introduction 

Program generation and transformation systems work on two language levels, the ob- 
ject level (i.e., the language of the manipulated programs), and the meta level (i.e., 
the implementation language of the system itself). Conceptually, these two levels are 
unrelated but in practice they have to be interfaced with each other. Often, the object 
language is simply embedded within the meta language, using an abstract data type to 
represent the abstract syntax trees of the object language. The actual implementation 
mechanisms (e.g., records, objects, or algebraic data types) may vary but embeddings 
can be used with arbitrary meta languages and make their full programming capabil- 
ities immediately available for program manipulations. Meta-level representations of 
object-level program fragments are then built in an essentially asyntactic fashion using 
the operations provided by the data type. 

However, syntax matters. The conceptual distance between the concrete programs 
that we understand and their meta-level representations that we need to use grows with 
the complexity of the object language syntax and the size of the represented program 
fragments, and the use of abstract syntax becomes less and less satisfactory. Languages 
like Prolog and Haskell allow a rudimentary integration of concrete syntax via
user-defined operators. However, this is usually restricted to simple precedence grammars
so that realistic object languages cannot be represented well, if at all. Traditionally, a
quotation/anti-quotation mechanism is thus used to interface languages: a quotation 
denotes an object-level fragment, an anti-quotation denotes the result of a meta-level 
computation which is spliced into the object-level fragment. If object language and 
meta language coincide, the switch between the then purely conceptual language levels 
is easy and a single compiler can be used to process them both. If the object language 
is user-definable, the mechanism becomes more complicated to implement and usually 
requires specialized meta languages such as ASF+SDF [6], Maude [5], or TXL [4] 
which support syntax definition and reflection. 

In this paper, we follow a slightly different path. We describe the first experiences 
with our ongoing work on adding support for user-definable concrete syntax to Auto- 
Bayes [9,7], a large, schema-based program synthesis system implemented in Prolog. 
We follow the general approach outlined in [15], which allows the extension of an 
arbitrary meta language with concrete object language syntax by merging the syntax 
definitions of both languages. We show how the approach is instantiated for Prolog 
and describe the processing steps required for a seamless interaction of concrete syntax 
fragments with the remaining “legacy” meta-programming system based on abstract 
syntax — despite all its idiosyncrasies. 

The original motivation for this specific path was purely pragmatic. We wanted to 
realize the benefits of concrete syntax without forcing the disruptive migration of the en- 
tire system to a different meta-programming language. Retrofitting Prolog with support 
for concrete syntax allows a gradual migration. Our long-term goal, however, is more 
ambitious: we want to support domain experts in creating and maintaining schemas. We 
expect that the use of concrete syntax makes it easier to gradually “schematize” exist- 
ing domain programs. We also plan to use different grammars to describe programs on 
different levels of abstraction and thus to support domain engineering. 

2 Overview of the AutoBayes-System 

AutoBayes is a fully automatic program synthesis system for data analysis problems.
It has been used to derive programs for applications like the analysis of planetary
nebulae images taken by the Hubble space telescope [8] as well as research-level machine
learning algorithms [1]. It is implemented in SWI-Prolog [16] and currently comprises 
about 64,000 lines of documented code; Figure 1 shows the system architecture. 

AutoBayes derives code from a statistical model which describes the expected
properties of the data in a fully declarative fashion: for each problem variable (i.e., 
observation or parameter), properties and dependencies are specified via probability 
distributions and constraints. The top box in Figure 1 shows the specification of a nebu- 
lae analysis model. The last two clauses are the core of this specification; the remaining 
clauses just declare the model constants and variables, and impose constraints on them. 
The distribution clause 

    x(I,J) ~ gauss(i0 * exp(-((I-x0)**2+(J-y0)**2)/(2*r**2)), sigma).

states that, with an expected error sigma, the expected value of the observation x at
a given position (i, j) is a function of this position and the nebula's center position
(x0, y0), radius r, and overall intensity i0. The task clause
Fig. 1. AutoBayes system architecture.


    max pr(x | {i0, x0, y0, r, sigma}) for {i0, x0, y0, r, sigma}.

specifies the analysis task the synthesized program has to solve, i.e., to estimate the 
parameter values which maximize the probability of actually observing the given data 
and thus under the given model best explain the observations. In this case, the task can 
be solved by a mean square error minimization due to the Gaussian distribution of the
data and the specific form of the probability. Note, however, that (i) this is not
immediately clear from the model, (ii) the function to be minimized is not explicitly given
in the model, and (iii) even small modifications of the model may require completely
different algorithms. 

AutoBayes thus derives the code following a schema-based approach. A pro- 
gram schema consists of a parameterized code fragment (i.e., template) and a set of 
constraints. Code fragments are written in ABIR (AutoBayes Intermediate Representation),
which is essentially a “sanitized” variant of C (e.g., neither pointers nor side
effects in expressions) but also contains a number of domain-specific constructs (e.g.,
vector/matrix operations, finite sums, and convergence-loops). The parameters are in- 
stantiated either directly by the schema or by AutoBayes calling itself recursively 
with a modified problem. The constraints determine whether a schema is applicable 
and how the parameters can be instantiated. They are formulated as conditions on the 
model, either directly on the specification, or indirectly on a Bayesian network [3] ex- 
tracted from the specification. 
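
As a rough illustration of the schema notion described above, the following minimal sketch shows a schema as a Prolog clause whose body first checks an applicability constraint and then instantiates a code template. The predicates toy_schema/3 and is_gaussian/2, the list-based model representation, and the template itself are hypothetical and far simpler than anything in the actual AutoBayes schema library.

```prolog
% Hypothetical toy model: a list of facts about the problem variables.
% is_gaussian(+Model, +Var): a constraint checked on the specification.
is_gaussian(Model, Var) :-
    memberchk(dist(Var, gauss), Model).

% toy_schema(+Model, +Task, -Code)
% The schema is applicable only if the observed variable is
% Gaussian-distributed; the template is then instantiated with the
% parameters named in the task.
toy_schema(Model, max_pr(Var, Params), Code) :-
    is_gaussian(Model, Var),                          % constraint
    Code = block(local(Params),                       % template
                 series([minimize(squared_error(Var), Params)], []),
                 [comment('Least-squares estimate')]).
```

A query such as `toy_schema([dist(x, gauss)], max_pr(x, [i0, r]), Code)` instantiates the template, whereas the same query against a model without a Gaussian distribution for x simply fails, so that other schemas can be tried.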

The schemas are organized hierarchically into a schema library. Its top layers contain
decomposition schemas based on independence theorems for Bayesian networks
which try to break down the problem into independent sub-problems. These are
domain-specific divide-and-conquer schemas: the emerging sub-problems are fed back into the
synthesis process and the resulting programs are composed to achieve a solution for the 
original problem. Guided by the network structure, AutoBayes is thus able to synthesize
larger programs by composition of different schemas. The core layer of the library
contains statistical algorithm schemas such as expectation maximization (EM)
[10] and nearest neighbor clustering; usually, these generate the skeleton of the program.
The final layer contains standard numeric optimization methods such as
the simplex method or different conjugate gradient methods. These are applied after the
statistical problem has been transformed into an ordinary numeric optimization problem
and AutoBayes has failed to find a symbolic solution for that problem.

The schemas are applied exhaustively until all maximization tasks are rewritten 
into ABIR code. The schemas can explicitly trigger large-scale optimizations which 
take into account information from the synthesis process. For example, all numeric 
optimization routines restructure the goal expression using code motion, common
sub-expression elimination, and memoization. In a final step, AutoBayes translates the
ABIR code into code tailored for a specific run-time environment. Currently, it provides
code generators for the Octave and Matlab environments; it can also produce
standalone C and Modula-2 code. The entire synthesis process is supported by a large 
meta-programming kernel which includes the graphical reasoning routines, a symbolic- 
algebraic subsystem based on a rewrite engine, and a symbolic equation solver. 


3 Migrating from Abstract Syntax to Concrete Syntax 

In the existing AutoBayes implementation, schemas are simply Prolog-clauses and
code fragments are simply Prolog-terms. The excerpt in Figure 2 shows a schema that 
implements (i.e., generates code for) the Nelder-Mead simplex method for numerically 
optimizing a function with respect to a set of variables [11]. The complete schema 
comprises 508 lines of documented Prolog-code, and is fairly typical in most aspects, 
e.g., the size of the overall schema and of the fragment, respectively, the amount of 
meta-programming, or the ratio between the code constructed directly (e.g., Code) and 
recursively (e.g., Reflection). This schema is also used to generate the algorithm core
for the nebula specification. 

The excerpt shows why the simple abstract syntax approach quickly becomes cum- 
bersome as the schemas become larger. The code fragment is built up from many 
smaller fragments by the introduction of new meta-variables (e.g., Loop) because the
abstract syntax would become unreadable otherwise. However, this makes it harder to
follow and understand the overall structure of the algorithm. The schema is sprinkled
with a large number of calls to small meta-programming predicates (e.g., model_gensym
or index_make); this makes it harder to write schemas because one needs to know not
only the abstract syntax, but also a large part of the meta-programming base. The use of
Prolog's term builder =.. (which is required as Prolog does not support second-order
patterns) is particularly pervasive because schemas tend to be parameterized with the 
names of object-level data structures. In our experience, these peculiarities make the 



    schema(Formula, Vars, Constraint, Code) :-
      model_gensym(simplex, Simplex),
      SDim = [dim(A_BASE, Size1), dim(A_BASE, Size0)],
      SDecl = matrix(Simplex, double, SDim,
                [comment(['Simplex data structure: (', Size, '+1) ',
                          'points in the ', Size,
                          '-dimensional space'])]),
      var_fresh(I),
      var_fresh(J),
      index_make([I, dim(A_BASE, Size0)], Index_i),
      index_make([J, dim(A_BASE, Size1)], Index_j),
      Center_i =.. [Center, I],
      Simplex_ji =.. [Simplex, J, I],
      Centroid =
        for([Index_i],
            assign(Center_i, sum([Index_j], Simplex_ji), []),
            [comment(['Calculate the center of gravity in the simplex'])]),
      simplex_try(Formula, Simplex, ...,
                  -1, 'Reflect the simplex from the worst point (F = -1)',
                  Reflection),
      Loop = while(converging([...]),
                   series([Centroid, Reflection, ...], []),
                   [comment('Convergence loop')]),
      Code = block(local([SDecl, ...]),
                   series([Init, Loop, Copy], []),
                   [label(SLabel), comment(XP)]).


Fig. 2. AutoBayes schema for the Nelder-Mead simplex method (excerpt)


learning curve much steeper than it ought to be, which in turn makes it difficult for a 
domain expert to gradually extend the system’s capabilities by adding a single schema. 
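
The role of the =.. term builder mentioned above can be illustrated in isolation. Since a pattern with a variable in functor position is not legal in standard Prolog, a term whose functor is only known at schema application time has to be assembled from a list. The following is a minimal sketch; the helper array_ref/3 is a hypothetical name and not part of AutoBayes.

```prolog
% array_ref(+Name, +Indices, -Ref)
% Builds the term Name(Index1, ..., IndexN) for a functor that is
% only known at run time. =.. ("univ") converts between a term and
% the list [Functor | Arguments].
array_ref(Name, Indices, Ref) :-
    atom(Name),
    Ref =.. [Name | Indices].
```

For instance, `array_ref(simplex, [j, i], R)` binds R to `simplex(j, i)`. Each such construction needs its own goal and its own meta-variable in the schema, which is the overhead that the concrete syntax migration below removes.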

In the following, we show how this schema is migrated and refactored, making it 
easier to understand and maintain. The first step is to replace terms representing abstract 
syntax by concrete syntax literals, e.g.,

    Centroid = |[
      /* Calculate the center of gravity in the simplex */
      for( Index_i:idx )
        Center_i := sum( Index_j:idx ) Simplex_ji:exp
    ]|

Here, we use |[ ... ]| as quotation operator but leave anti-quotation implicit. Prolog
(meta-) variables are distinguished by capitalization and can thus be used directly in the
concrete syntax. A general anti-quotation mechanism is not required since Prolog is a
relational language and the result of a meta-computation is not uniquely determined:
any number of variables can be instantiated as a result. In a few places, the
meta-variables are tagged with their syntactic category, e.g., Index_i:idx. This allows the
parser to resolve ambiguities and to introduce the injection functions necessary to build
well-formed syntax trees.

The next step inlines the indexes, which eliminates the calls of the index_make
meta-predicate shown in Figure 2.

    Centroid = |[
      /* Calculate the center of gravity in the simplex */
      for( I := A_BASE .. Size0 )
        Center_i := sum( J := A_BASE .. Size1 ) Simplex_ji:exp
    ]|

Incidentally, this also eliminates the need for the tags because the syntactic category is 
now determined by the source text. Next, the array-references are inlined, thus
eliminating the =..-constructors.


    Centroid = |[
      /* Calculate the center of gravity in the simplex */
      for( I := A_BASE .. Size0 )
        Center[I] := sum( J := A_BASE .. Size1 ) Simplex[J, I]
    ]|

Finally, the object variables are tagged with @new; this is a special anti-quotation oper- 
ator which constructs fresh object-level variable names. 1 

    Centroid = |[
      /* Calculate the center of gravity in the simplex */
      for( I@new := A_BASE .. Size0 )
        Center[I] := sum( J@new := A_BASE .. Size1 ) Simplex[J, I]
    ]|

Here, the use of concrete syntax and the @new reduces the overall size by approximately 
30% and eliminates the need for any explicit meta-programming. The reduction ratio is 
more or less maintained over the entire schema. After migration along the lines above 
(i.e., replacing the individual code fragments by concrete syntax and inlining the results 
at their use sites), the schema size is reduced from 508 lines to 366 lines. 2 At the same 
time, the resulting fewer but larger code fragments give a better insight into the structure 
of the generated code. 

1 In ABIR, index variables are declared implicitly, so that the construction of an explicit decla- 
ration is not required. 

2 Comparing lines of code is a rather imprecise measurement. After white space removal, the 
original schema has 7779 characters and the resulting schema with concrete syntax 5538, 
confirming a reduction of 30 % in actual code size. 



4 Embedding Concrete Syntax into Prolog 

The extension of Prolog with concrete syntax as sketched in the previous section is 
achieved using the syntax definition formalism SDF2 [13,2] and the transformation lan- 
guage Stratego [12,14] following the approach described in [15]. SDF is used to specify 
the syntax of ABIR and Prolog as well as the embedding of ABIR into Prolog. Strat- 
ego is used to transform syntax trees over this combined language into a pure Prolog 
program. Apart from the application to Prolog, this work extends [15] with additional 
transformations on the embedded object code in order to produce code compatible with 
the legacy AutoBayes system and to support abstractions for second-order variables 
and fresh variable generation. In this section we give an overview of the components of 
concrete-pl, the transformation system mapping Prolog with concrete syntax to pure 
Prolog. 


4.1 Combining Syntax Definitions 

The extension of a meta-language with concrete object syntax requires an embedding
of the syntax of object code fragments as expressions in the meta-language. We thus 
created syntax definitions of Prolog and ABIR using SDF. Since SDF is a modular 
syntax definition formalism, combining languages is simply a matter of importing the 
appropriate modules, as illustrated by the following excerpt from the embedding of 
ABIR into Prolog: 

    module PrologABIR
    imports Prolog ABIR
    exports
      context-free syntax
        "|[" Exp  "]|" -> PrologTerm {cons("ToTerm"), prefer}
        "|[" Stat "]|" -> PrologTerm {cons("ToTerm"), prefer}
      variables
        [A-Z][A-Za-z0-9_]*        -> Id  {prefer}
        [A-Z][A-Za-z0-9_]* ":exp" -> Exp

This module allows us to use ABIR expressions and statements as Prolog terms, by
quoting them with the |[ and ]| delimiters. The variables section declares schemas
for meta-variables. Thus, a capitalized identifier can be used as a meta-variable for
identifiers, and a capitalized identifier tagged with :exp can be used as a meta-variable
for expressions. As mentioned above, a general anti-quotation mechanism other than the
inclusion of meta-variables is not required since Prolog is a relational language and the
result of a meta-computation is not uniquely determined.


4.2 Exploding Embedded Abstract Syntax 

After parsing a schema with the combined syntax definition the resulting abstract syntax 
tree is a mixture of Prolog and ABIR abstract syntax. For example, the Prolog-goal 


    Code = |[ X := Y:exp + z ]|


is parsed into the abstract syntax tree 


    bodygoal(infix(var("Code"), op(symbol("=")),
      toterm(assign(var(meta-var("X")),
                    plus(meta-var("Y:exp"), var("z"))))))

The language transitions are characterized by the toterm-constructor, and meta-variables
are indicated by the meta-var-constructor. Thus, bodygoal and infix belong to Prolog
abstract syntax, while assign, var and plus belong to ABIR abstract syntax. A
mixed syntax tree can be translated to a pure Prolog tree by “exploding” embedded tree
constructors to functor applications:

    bodygoal(infix(var("Code"), op(symbol("=")),
      func(functor(word("assign")),
        [func(functor(word("var")), [var("X")]),
         func(functor(word("plus")),
           [var("Y"),
            func(functor(word("var")),
              [atom(quotedname("'z'"))])])])))

After pretty-printing this tree we get the pure Prolog-goal 

    Code = assign(var(X), plus(Y, var('z')))

Note how the meta-variables X and Y have become Prolog variables representing a
variable name and an expression, respectively, while the object variable z has become a
character literal.

Explosion is defined generically using transformations on mixed syntax trees, i.e., it 
is independent from the object language. The basic transformation is rewriting functor 
applications in ABIR abstract syntax to functor applications in Prolog abstract syntax. 
This is expressed by the following Stratego transformation rule: 

    TrmOp : Op#(Ts1) -> func(functor(word(<lower-case> Op)), Ts2)
      where <map(trm-explode)> Ts1 => Ts2

Several other rules deal with special constructs such as meta-variables and lists.
Rewriting the final Centroid-fragment from Section 3 then produces the pure Prolog-goal


    Centroid =
      commented(
        comment(['Calculate the center of gravity in the simplex']),
        for(indexlist([index(newvar(I), var(A_BASE), var(Size0))]),
            assign(arraysub(Center, [var(I)]),
                   sum(indexlist([index(newvar(J),
                                        var(A_BASE), var(Size1))]),
                       call(Simplex, [var(J), var(I)])))))
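
The generic explosion step can be mimicked in Prolog itself, which may help readers who are less familiar with Stratego. The sketch below is not part of concrete-pl; it assumes a hypothetical tree representation in which object-level nodes are written node(Constructor, Children), meta-variables are meta_var(Name), and string leaves are str(Text).

```prolog
% explode(+ObjectTree, -PrologTree)
% Maps an object-level syntax tree to a tree in Prolog abstract syntax.
explode(meta_var(Name), var(Name)).             % meta-variable -> Prolog variable
explode(str(Text), atom(quotedname(Text))).     % string leaf -> quoted atom
explode(node(Op, Kids0), func(functor(word(Lower)), Kids)) :-
    downcase_atom(Op, Lower),                   % cf. <lower-case> Op
    maplist(explode, Kids0, Kids).              % cf. <map(trm-explode)> Ts1 => Ts2
```

The last clause corresponds to the TrmOp rule: it is the only one that refers to constructors generically, which is why the explosion does not depend on the object language.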



4.3 Custom Abstract Syntax 

Comparing the generated Centroid-goal above with the original in Figure 2 shows 
that the abstract syntax underlying the concrete syntax fragments does not correspond 
exactly to the original abstract syntax used in AutoBayes. The latter version is less 
precise, but more compact, since it was designed for direct use. In order to interface 
schemas written in concrete syntax with legacy components of the synthesis system, 
additional transformations are thus applied to the Prolog code, which translate between 
the two versions of the abstract syntax. For the Centroid-fragment this produces: 

    Centroid =
      for([idx(newvar(I), A_BASE, Size0)],
          assign(arraysub(Center, [I]),
                 sum([idx(newvar(J), A_BASE, Size1)], call(Simplex, [J, I]))),
          [comment(['Calculate the center of gravity in the simplex'])])
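
The translation between the two versions of the abstract syntax can be pictured as a bottom-up term rewriting. The following sketch is written in plain Prolog rather than Stratego, and its rule set is an assumption read off the two Centroid terms above; the actual transformations in concrete-pl may differ in both form and coverage.

```prolog
% simplify(+Precise, -Legacy)
% Bottom-up translation: rewrite the arguments first, then apply a
% node-level rule if one matches; otherwise keep the node as it is.
simplify(T, T) :- var(T), !.                 % leave Prolog variables alone
simplify(T0, T) :-
    T0 =.. [F | Args0],
    maplist(simplify, Args0, Args),
    T1 =.. [F | Args],
    (   rule(T1, T2) -> T = T2 ; T = T1 ).

% Node-level rules (illustrative subset only).
rule(var(X), X).                             % drop the var wrapper
rule(indexlist(Is), Is).                     % index lists become plain lists
rule(index(V, Lo, Hi), idx(V, Lo, Hi)).
rule(commented(comment(C), for(Is, Body)),   % comments become annotations
     for(Is, Body, [comment(C)])).
```

Keeping the rules as separate facts makes it easy to see which parts of the translation are specific to ABIR: only rule/2 mentions object-language constructors, while simplify/2 is generic.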


4.4 Lifting Predicates 

In AutoBayes, array accesses are represented by means of functor applications and 
object variable names are generated by gensym-predicates. This cannot be expressed 
in a plain Prolog term. Thus arraysubs and calls are hoisted out of abstract syntax
terms and turned into term constructors and fresh variable generators as follows: 

    var_fresh(I), _a =.. [Center, I], var_fresh(J), _b =.. [Simplex, J, I],
    Centroid =
      for([idx(I, A_BASE, Size0)],
          assign(_a, sum([idx(J, A_BASE, Size1)], _b)),
          [comment(['Calculate the center of gravity in the simplex'])])

Hence, the embedded concrete syntax is transformed exactly into the form needed to 
interface it with the legacy system. 
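
The hoisting step can be sketched as a traversal that returns both the rewritten term and the list of goals to be placed in front of it. The code below is a simplified reconstruction in plain Prolog under our own assumptions; the predicate names lift/3 and lift_args/3 are hypothetical, and the sketch operates on Prolog terms directly rather than on Prolog abstract syntax trees as the real tool does.

```prolog
% lift(+Term0, -Term, -Goals)
% Replaces every arraysub(Name, Args) or call(Name, Args) subterm by a
% fresh Prolog variable V and records the goal V =.. [Name | Args];
% replaces newvar(X) by X and records the goal var_fresh(X).
lift(T, T, []) :- var(T), !.
lift(newvar(X), X, [var_fresh(X)]) :- !.
lift(T0, V, Goals) :-
    ( T0 = arraysub(Name, Args0) ; T0 = call(Name, Args0) ), !,
    lift(Args0, Args, Goals0),
    append(Goals0, [V =.. [Name | Args]], Goals).
lift(T0, T, Goals) :-
    T0 =.. [F | Args0],
    lift_args(Args0, Args, Goals),
    T =.. [F | Args].

% lift_args(+Args0, -Args, -Goals): lift each argument left to right
% and concatenate the goals in the same order.
lift_args([], [], []).
lift_args([A0 | As0], [A | As], Goals) :-
    lift(A0, A, G1),
    lift_args(As0, As, G2),
    append(G1, G2, Goals).
```

Because the traversal proceeds left to right, the goals come out in the order in which the corresponding subterms occur, so each var_fresh goal precedes the term constructions that use its variable in the Centroid example.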

5 Conclusions 

Syntax matters. Program generation and transformation systems manipulate large, pa- 
rameterized object language fragments. Operating on such fragments using abstract- 
syntax trees or string-based concrete syntax is possible, but has severe limitations in 
maintainability and expressive power. Any serious program generator should thus pro- 
vide support for concrete object syntax together with the underlying abstract syntax. 
Our work on extending the AutoBayes synthesis system with concrete syntax shows 
that this can result in a considerable reduction in the code size, but more importantly, an 
improved readability of the code. In particular, abstracting out fresh-variable generation 
and second-order term construction allows the formulation of larger continuous
fragments and improves the “locality” in the schemas. The goal of having domain experts
write meta-programs depends on more than just concrete syntax;
there are other aspects that make meta-programming hard. However, we expect that concrete
syntax makes this task easier rather than more difficult. Further evaluation of the
approach involving domain experts should make clear whether this goal is achievable.


The contribution of this work is twofold. In the first place, we describe an extension 
of Prolog with concrete object syntax, which is a useful tool for all meta-programming 
systems using Prolog. The concrete-pl tool that implements the mapping back into
pure Prolog is available as a general tool for embedding object languages in Prolog. 3
In the second place, we demonstrate that the approach of [15] can indeed be applied
to meta languages other than Stratego, and we extend it with object-language-specific
transformations to achieve the integration with a legacy system. This allows a gradual
migration of existing systems, even if they were originally designed without support for 
concrete syntax in mind. 

The concrete-pl tool is independent of the embedded object language. However, 
the transformations applied to the Prolog-code after explosion are specific to the im- 
plementation of the ABIR embedding. This aspect should be generalized in order to 
support arbitrary object languages. This mainly requires factoring out the
postprocessing transformations on the Prolog-code and making these a parameter of the tool.